Skip to content

Support TiDB NullEQ pushdown#10946

Open
windtalker wants to merge 4 commits into
pingcap:masterfrom
windtalker:support_nulleq
Open

Support TiDB NullEQ pushdown#10946
windtalker wants to merge 4 commits into
pingcap:masterfrom
windtalker:support_nulleq

Conversation

@windtalker

@windtalker windtalker commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #5102

Problem Summary:

TiDB can push down NullEQ expressions, but TiFlash does not fully recognize and execute the pushed-down NullEQ semantics yet, including rough-set filtering in DeltaMerge.

What is changed and how it works?

support nulleq in tiflash
DeltaMerge: fix rough-set semantics for NullEQ

This PR adds TiDB NullEQ function support in TiFlash, wires NullEQ pushdown handling, adds execution tests, and fixes DeltaMerge rough-set/min-max index semantics for NullEQ.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Support TiDB NullEQ pushdown in TiFlash.

Summary by CodeRabbit

  • New Features

    • Added TiDB null-safe equality (<=>) end-to-end: tidbNullEQ function support and DeltaMerge filtering via the null_equal rough filter.
    • Extended recognition and handling across multiple data types, including vector formats.
  • Bug Fixes

    • Improved nullable min/max evaluation for NULL-safe equality, including correct behavior for mixed NULL/non-NULL packs.
    • Fixed nullable string min/max extraction used by these checks.
  • Tests

    • Added function, parser, and min/max index coverage for tidbNullEQ/null_equal, including negation and collator forwarding.

 

Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
 

DeltaMerge: fix rough-set semantics for NullEQ

- add a dedicated `NullEqual` rough-set operator instead of lowering to `Equal`
- teach minmax index to evaluate nullable `NullEQ` with null-safe semantics
- keep the old minmax-index compatibility path conservative when it cannot distinguish pure-null from mixed packs
- add regression coverage for nullable packs and old minmax-index compatibility
- update the new NullEQ test files introduced in pingcap#10726 to use copyright year 2026
@ti-chi-bot ti-chi-bot Bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 1, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign solotzg for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 1, 2026
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9d127196-a7b0-4580-bc60-742050684ed1

📥 Commits

Reviewing files that changed from the base of the PR and between 31474b8 and c52b853.

📒 Files selected for processing (1)
  • dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp

📝 Walkthrough

Walkthrough

This PR adds TiDB NullEQ (<=>) support across coprocessor function mapping, function execution, DeltaMerge filter parsing, and MinMaxIndex rough-checking, with new unit and integration coverage for nullable and non-nullable cases.

Changes

NullEQ Push-Down Implementation

Layer / File(s) Summary
DAGUtils NullEQ signature mapping
dbms/src/Flash/Coprocessor/DAGUtils.cpp, dbms/src/Flash/Coprocessor/tests/gtest_tidb_null_eq_func.cpp
Maps TiDB NullEQ* scalar signatures to tidbNullEQ and verifies the mapping in a new gtest.
tidbNullEQ function implementation and tests
dbms/src/Functions/FunctionsComparison.cpp, dbms/src/Functions/tests/gtest_tidb_null_eq.cpp
Adds FunctionTiDBNullEQ, registers it, and tests NULL-safe equality behavior across nullable inputs, constants, and collators.
NullEqual RS operator and parser dispatch
dbms/src/Storages/DeltaMerge/Filter/NullEqual.h, dbms/src/Storages/DeltaMerge/Filter/RSOperator.{cpp,h}, dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.{cpp,h}, dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp
Introduces the NullEqual rough-set operator, adds parser routing for NullEQ* signatures, and covers the lowering behavior in parser tests.
MinMaxIndex checkNullEqual implementation
dbms/src/Storages/DeltaMerge/Index/MinMaxIndex.{cpp,h}, dbms/src/Storages/DeltaMerge/Index/tests/gtest_dm_minmax_index.cpp
Adds nullable null-equality rough-check support in MinMaxIndex, updates string boundary handling, and extends index tests for NullEQ and related parsing cases.

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related PRs

  • pingcap/tiflash#10769: Overlaps with the same null_equal DeltaMerge pipeline, including parser routing and MinMaxIndex::checkNullEqual.

Suggested reviewers: JaySon-Huang, gengliqi

Poem

I hop through bytes on silent paws,
And <=> now knows the NULL-safe laws.
Min-max checks and parsers sing,
A tidy tidbNullEQ spring!
🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding TiDB NullEQ pushdown support.
Description check ✅ Passed The description follows the template with problem summary, change summary, checklist, side effects, and release note filled in.
Linked Issues check ✅ Passed The changes implement NullEq pushdown and add multiple tests, including a dedicated DM parser test and several unit tests.
Out of Scope Changes check ✅ Passed The modified files all support NullEQ pushdown semantics, parser handling, or related tests, with no clear unrelated additions.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/DeltaMerge/Index/tests/gtest_dm_minmax_index.cpp`:
- Around line 2448-2455: The DAGQueryInfo construction in
gtest_dm_minmax_index.cpp is using the wrong overload and is missing required
parameters. Update the std::make_unique<DAGQueryInfo> call in the minmax index
test to match the full constructor signature used by ParseIn, including the
missing fts_query_info and used_indexes arguments in the correct order. Use the
DAGQueryInfo constructor and the surrounding ParseIn-style argument sequence as
the reference for locating and fixing this call.

In `@dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp`:
- Around line 95-103: The DAGQueryInfo construction in the null-eq parser test
is missing required arguments, so update the std::make_unique<DAGQueryInfo> call
to pass both fts_query_info and empty_used_indexes in the correct position. Use
the existing local values and match the constructor signature in DAGQueryInfo
exactly so the test builds against the current API.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fcd47c8c-fa93-4f5a-8ca5-a5a90c7d977b

📥 Commits

Reviewing files that changed from the base of the PR and between 63cff9a and e53c6b9.

📒 Files selected for processing (13)
  • dbms/src/Flash/Coprocessor/DAGUtils.cpp
  • dbms/src/Flash/Coprocessor/tests/gtest_tidb_null_eq_func.cpp
  • dbms/src/Functions/FunctionsComparison.cpp
  • dbms/src/Functions/tests/gtest_tidb_null_eq.cpp
  • dbms/src/Storages/DeltaMerge/Filter/NullEqual.h
  • dbms/src/Storages/DeltaMerge/Filter/RSOperator.cpp
  • dbms/src/Storages/DeltaMerge/Filter/RSOperator.h
  • dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.cpp
  • dbms/src/Storages/DeltaMerge/FilterParser/FilterParser.h
  • dbms/src/Storages/DeltaMerge/Index/MinMaxIndex.cpp
  • dbms/src/Storages/DeltaMerge/Index/MinMaxIndex.h
  • dbms/src/Storages/DeltaMerge/Index/tests/gtest_dm_minmax_index.cpp
  • dbms/src/Storages/DeltaMerge/tests/gtest_dm_filter_parser_nulleq.cpp

Comment thread dbms/src/Storages/DeltaMerge/Index/tests/gtest_dm_minmax_index.cpp
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
@ti-chi-bot

ti-chi-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

@windtalker: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test c52b853 link true /test pull-unit-test
pull-unit-next-gen c52b853 link true /test pull-unit-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@yy782

yy782 commented Jul 3, 2026

Copy link
Copy Markdown

Hi @windtalker and maintainers,
I noticed that PR #10946 also addresses the same issue (#5102). I was assigned to this issue and have been working on it independently. I've just submitted my implementation in this PR#10954.
Since we both have PRs for the same issue, I'd like to understand how we should resolve this overlap. I'm open to collaboration, but I also want to ensure that my work on this assigned issue is recognized.
Could you please advise on the best way forward? Should we compare implementations and consolidate, or are there other preferences from the maintainers?
Thanks!

@JaySon-Huang

Copy link
Copy Markdown
Contributor

Hi @yy782 , thanks a lot for your interest in TiFlash and for working on this issue.

Sorry that we did not notice earlier that this issue had been assigned to you, while another implementation was already ongoing on our side. This caused two overlapping PRs for the same issue, and I apologize for the confusion.

After reviewing both PRs, I would prefer to use #10946 as the base implementation. The main reason is that #10946 introduces a dedicated tidbNullEQ function and handles the NULL-safe equality semantics directly in the function implementation, including nullable inputs, NULL-only columns, nullable constants, and collator forwarding. It also adds a dedicated DeltaMerge MinMaxIndex::checkNullEqual path, which gives more precise rough-set behavior for nullable packs and is easier to maintain than reusing normal Equal semantics for NullEQ.

That said, your PR #10954 has very valuable test coverage, especially the richer type-combination tests and the fullstack end-to-end test. I think those tests would be a good supplement to #10946.

So my suggestion is to keep #10946 as the core implementation, and incorporate the additional test coverage from #10954. We will make sure your contribution is properly recognized when doing that. Does this sound reasonable to you?

@yy782

yy782 commented Jul 3, 2026

Copy link
Copy Markdown

Thanks @JaySon-Huang! I'm happy to collaborate. Please let me know how you'd like me to help integrate the tests.

@JaySon-Huang

Copy link
Copy Markdown
Contributor

Hi @yy782,

Thanks for being willing to collaborate.

I think a good next step would be to use #10946 as the base implementation. You can first run:

gh pr checkout 10946

This will check out windtalker:support_nulleq locally. Then, based on the current commit of that branch, please port the useful test coverage from your PR, mainly from these files:

  • dbms/src/Functions/tests/gtest_nulleq.cpp
  • dbms/src/Storages/DeltaMerge/Index/tests/gtest_dm_minmax_index.cpp
  • tests/fullstack-test/expr/nulleq.test

A few notes:

  • In Support TiDB NullEQ pushdown #10946, the registered function name is tidbNullEQ, not nullEq, so the function tests should call tidbNullEQ.
  • Please try to keep your broader type-combination coverage, since that is a valuable supplement to Support TiDB NullEQ pushdown #10946.
  • Please make sure the related unit and fullstack tests can pass locally after the changes.

After that, you can either force-push the updated changes to your current feature/null-eq-pushdown branch, or create a new branch and open a new PR based on #10946.

Comment on lines +220 to +223
private:
TiDB::TiDBCollatorPtr collator = nullptr;
std::shared_ptr<FunctionEquals> equals_function = std::make_shared<FunctionEquals>();
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we can remove the collator from FunctionTiDBNullEQ

Suggested change
private:
TiDB::TiDBCollatorPtr collator = nullptr;
std::shared_ptr<FunctionEquals> equals_function = std::make_shared<FunctionEquals>();
};
private:
std::shared_ptr<FunctionEquals> equals_function = std::make_shared<FunctionEquals>();
};

Comment on lines +46 to +50
void setCollator(const TiDB::TiDBCollatorPtr & collator_) override
{
collator = collator_;
equals_function->setCollator(collator_);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void setCollator(const TiDB::TiDBCollatorPtr & collator_) override
{
collator = collator_;
equals_function->setCollator(collator_);
}
void setCollator(const TiDB::TiDBCollatorPtr & collator_) override
{
equals_function->setCollator(collator_);
}

extern const int LOGICAL_ERROR;
} // namespace ErrorCodes

class FunctionTiDBNullEQ : public IFunction

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class FunctionTiDBNullEQ : public IFunction
/// TiDB null-safe equal (`<=>`, mapped as `tidbNullEQ`).
///
/// Difference from `equals`:
/// - `equals` only compares nested values and does not interpret NULL markers. For nullable
/// inputs, rows with NULL may still get 0/1 from placeholder nested data instead of SQL `=`
/// semantics (where `NULL = x` is unknown/NULL).
/// - `tidbNullEQ` applies TiDB `<=>` rules: `NULL <=> NULL` => 1, `NULL <=> x` => 0, and defers
/// to `equals` only when neither side is NULL. The result is always non-nullable UInt8.
class FunctionTiDBNullEQ : public IFunction

Comment on lines +493 to +494
template <typename T>
RSResults MinMaxIndex::checkNullableNullEqualImpl(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template <typename T>
RSResults MinMaxIndex::checkNullableNullEqualImpl(
/// Rough-check `col <=> value` for a nullable column using per-pack min/max.
/// `value` must be non-null; `col <=> NULL` is handled by `checkIsNull` before reaching here.
///
/// Unlike `checkNullableCmp` (SQL `=`), null-safe equal treats `NULL <=> value` as definitely false
/// rather than unknown, so we never tag results with `has_null` (no `*Null` RSResult variants).
template <typename T>
RSResults MinMaxIndex::checkNullableNullEqualImpl(

Comment on lines +502 to +521
RSResults results(pack_count, RSResult::Some);
const auto & minmaxes_data = toColumnVectorData<T>(column_nullable.getNestedColumnPtr());
for (size_t i = start_pack; i < start_pack + pack_count; ++i)
{
if (details::minIsNull(null_map, i))
{
if (has_null_marks[i] && !has_value_marks[i])
results[i - start_pack] = RSResult::None;
continue;
}

auto min = minmaxes_data[i * 2];
auto max = minmaxes_data[i * 2 + 1];
auto value_result = RoughCheck::CheckEqual::check<T>(value, type, min, max);
if (has_null_marks[i] && value_result == RSResult::All)
results[i - start_pack] = RSResult::Some;
else
results[i - start_pack] = value_result;
}
return results;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RSResults results(pack_count, RSResult::Some);
const auto & minmaxes_data = toColumnVectorData<T>(column_nullable.getNestedColumnPtr());
for (size_t i = start_pack; i < start_pack + pack_count; ++i)
{
if (details::minIsNull(null_map, i))
{
if (has_null_marks[i] && !has_value_marks[i])
results[i - start_pack] = RSResult::None;
continue;
}
auto min = minmaxes_data[i * 2];
auto max = minmaxes_data[i * 2 + 1];
auto value_result = RoughCheck::CheckEqual::check<T>(value, type, min, max);
if (has_null_marks[i] && value_result == RSResult::All)
results[i - start_pack] = RSResult::Some;
else
results[i - start_pack] = value_result;
}
return results;
/// Default to Some: conservative when min/max cannot prove a tighter bound.
RSResults results(pack_count, RSResult::Some);
const auto & minmaxes_data = toColumnVectorData<T>(column_nullable.getNestedColumnPtr());
for (size_t i = start_pack; i < start_pack + pack_count; ++i)
{
/// Min slot is NULL on indexes built before v6.4.0. For a pure-NULL pack,
/// `NULL <=> value` is false for every row, so the pack can be skipped.
if (details::minIsNull(null_map, i))
{
if (has_null_marks[i] && !has_value_marks[i])
results[i - start_pack] = RSResult::None;
continue;
}
auto min = minmaxes_data[i * 2];
auto max = minmaxes_data[i * 2 + 1];
auto value_result = RoughCheck::CheckEqual::check<T>(value, type, min, max);
/// Non-null values may all equal `value`, but NULL rows still make `col <=> value` false.
/// Downgrade All => Some so the pack is read and filtered row by row.
if (has_null_marks[i] && value_result == RSResult::All)
results[i - start_pack] = RSResult::Some;
else
results[i - start_pack] = value_result;
}
return results;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement NullEq function push down

3 participants